Abstract 

Huffman coding finds a prefix code that minimizes mean codeword length for a given probability distribution 
over a finite number of items. Campbell generalized the Huffman problem to a family of problems in which the 
goal is to minimize not mean codeword length $\sum_i p_i l_i$ but rather a generalized mean of the form $\varphi^{-1}(\sum_i p_i \varphi(l_i))$, 
where $l_i$ denotes the length of the $i$th codeword, $p_i$ denotes the corresponding probability, and $\varphi$ is a monotonically 
increasing cost function. Such generalized means — also known as quasiarithmetic or quasilinear means — have a 
number of diverse applications, including applications in queueing. Several quasiarithmetic-mean problems have novel 
simple redundancy bounds in terms of a generalized entropy. A related property involves the existence of optimal 
codes: For "well-behaved" cost functions, optimal codes always exist for (possibly infinite-alphabet) sources having 
finite generalized entropy. Solving finite instances of such problems is done by generalizing an algorithm for finding 
length-limited binary codes to a new algorithm for finding optimal binary codes for any quasiarithmetic mean with a 
convex cost function. This algorithm can be performed using quadratic time and linear space, and can be extended to 
other penalty functions, some of which are solvable with similar space and time complexity, and others of which are 
solvable with slightly greater complexity. This reduces the computational complexity of a problem involving minimum 
delay in a queue, allows combinations of previously considered problems to be optimized, and greatly expands the 
space of problems solvable in quadratic time and linear space. The algorithm can be extended for purposes such as 
breaking ties among possibly different optimal codes, as with bottom-merge Huffman coding. 

Index Terms 

Optimal prefix code, Huffman algorithm, generalized entropies, generalized means, quasiarithmetic means, 
queueing. 
Source Coding for Quasiarithmetic Penalties 



I. Introduction 

It is well known that Huffman coding [1] yields a 
prefix code minimizing expected length for a known 
finite probability mass function. Less well known are 
the many variants of this algorithm that have been 
proposed for related problems [2]. For example, in his 
doctoral dissertation, Humblet discussed two problems 
in queueing that have nonlinear terms to minimize [3]. 
These problems, and many others, can be reduced to a 
certain family of generalizations of the Huffman problem 
introduced by Campbell in [4]. 

In all such source coding problems, a source emits 
symbols drawn from the alphabet $\mathcal{X} = \{1, 2, \ldots, n\}$, 
where $n$ is an integer (or possibly infinity). Symbol $i$ has 
probability $p_i$, thus defining probability mass function $p$. 
We assume without loss of generality that $p_i > 0$ for 
every $i \in \mathcal{X}$, and that $p_i \le p_j$ for every $i > j$ ($i, j \in \mathcal{X}$). 
The source symbols are coded into codewords composed 
of symbols of the $D$-ary alphabet $\{0, 1, \ldots, D - 1\}$, 
most often the binary alphabet, $\{0, 1\}$. The codeword 
$c_i$ corresponding to symbol $i$ has length $l_i$, thus defining 
length distribution $l$. Finding values for $l$ is sufficient to 
find a corresponding code. 

Huffman coding minimizes $\sum_{i \in \mathcal{X}} p_i l_i$. Campbell's 
formulation adds a continuous (strictly) monotonic 
increasing cost function $\varphi(l) : \mathbb{R}_+ \to \mathbb{R}_+$. The value to 
minimize is then 

$$L(p, l, \varphi) \triangleq \varphi^{-1}\left(\sum_{i \in \mathcal{X}} p_i \varphi(l_i)\right). \qquad (1)$$

Campbell called (1) the "mean length for the cost 
function $\varphi$"; for brevity, we refer to it, or any value to 
minimize, as the penalty. Penalties of the form (1) are 
called quasiarithmetic or quasilinear; we use the former 
term in order to avoid confusion with the more common 
use of the latter term in convex optimization theory. 

Note that such problems can be mathematically described 
if we make the natural coding constraints explicit: 
the integer constraint, $l_i \in \mathbb{Z}_+$, and the Kraft 
(McMillan) inequality [5], 

$$\kappa(l) \triangleq \sum_{i \in \mathcal{X}} D^{-l_i} \le 1.$$

Given these constraints, examples of $\varphi$ in (1) include 
a quadratic cost function useful in minimizing delay due 
to queueing and transmission, 

$$\varphi(x) = \alpha x + \beta x^2 \qquad (2)$$

for nonnegative $\alpha$ and $\beta$ [6], and an exponential cost 
function useful in minimizing probability of buffer overflow, 
$\varphi(x) = D^{tx}$ for positive $t$ [3], [7]. These and other 
examples are reviewed in the next section. 
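
As a concrete illustration of the quantities just introduced, the following minimal Python sketch evaluates the Kraft sum and Campbell's penalty for a given length distribution. The function names (`kraft_sum`, `quasiarithmetic_penalty`, `quadratic`, `exponential`) and the sample distribution are illustrative choices, not part of the original paper; the closed-form inverse of the quadratic cost assumes $\beta > 0$.

```python
import math

def kraft_sum(lengths, D=2):
    """Kraft sum: sum_i D^(-l_i); a prefix code with these lengths exists iff it is <= 1."""
    return sum(D ** (-l) for l in lengths)

def quasiarithmetic_penalty(p, lengths, phi, phi_inv):
    """Campbell's mean length: phi^{-1}( sum_i p_i * phi(l_i) )."""
    return phi_inv(sum(pi * phi(li) for pi, li in zip(p, lengths)))

def quadratic(alpha, beta):
    """Cost phi(x) = alpha*x + beta*x^2 and its inverse on x >= 0 (assumes beta > 0)."""
    phi = lambda x: alpha * x + beta * x * x
    phi_inv = lambda y: (-alpha + math.sqrt(alpha * alpha + 4 * beta * y)) / (2 * beta)
    return phi, phi_inv

def exponential(t, D=2):
    """Cost phi(x) = D^(t*x) and its inverse (assumes t > 0)."""
    phi = lambda x: D ** (t * x)
    phi_inv = lambda y: math.log(y, D) / t
    return phi, phi_inv

# Hypothetical example: a dyadic source and its Huffman lengths.
p = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]

identity = (lambda x: x, lambda y: y)
linear_penalty = quasiarithmetic_penalty(p, lengths, *identity)
quadratic_penalty = quasiarithmetic_penalty(p, lengths, *quadratic(1.0, 1.0))
exponential_penalty = quasiarithmetic_penalty(p, lengths, *exponential(1.0))
```

With the linear cost the penalty is the ordinary mean length; for the convex costs it is never smaller than the mean length, since longer codewords are penalized more heavily.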

Campbell noted certain properties for convex $\varphi$, such 
as those examples above, and others for concave $\varphi$. 
Strictly concave $\varphi$ penalize shorter codewords more 
harshly than the linear function and penalize longer 
codewords less harshly. Conversely, strictly convex $\varphi$ 
penalize longer codewords more harshly than the linear 
function and penalize shorter codewords less harshly. 
Convex $\varphi$ need not yield convex $L$, although $\varphi(L)$ is 
clearly convex if and only if $\varphi$ is. Note that one can 
map decreasing $\varphi$ to a corresponding increasing function 
$\tilde{\varphi}(l) = \varphi_{\max} - \varphi(l)$ without changing the value of 
$L$ (e.g., for $\varphi_{\max} = \varphi(0)$). Thus the restriction to 
increasing $\varphi$ can be trivially relaxed. 

We can generalize $L$ by using a two-argument cost 
function $f(l, p)$ instead of $\varphi(l)$, as in (3), and adding 
$\{\infty\}$ to its range. We usually choose functions with the 
following property: 
following property: 

Definition 1: A cost function $f(l, p)$ and its associated 
penalty $\tilde{L}$ are differentially monotonic if, for every $l > 1$, 
whenever $f(l - 1, p_i)$ is finite and $p_i > p_j$, 
$f(l, p_i) - f(l - 1, p_i) > f(l, p_j) - f(l - 1, p_j)$. 
This property means that the contribution to the penalty 
of an $l$th bit in a codeword will be greater if the 
corresponding event is more likely. Clearly any $f(l, p) = 
p\varphi(l)$ will be differentially monotonic. This restriction 
on the generalization will aid in finding algorithms 
for coding such cost functions, which we denote as 
generalized quasiarithmetic penalties: 

Definition 2: Let $f(l, p) : \mathbb{R}_+ \times [0, 1] \to \mathbb{R}_+ \cup \{\infty\}$ 
be a function nondecreasing in $l$. Then 

$$\tilde{L}(p, l, f) \triangleq \sum_{i \in \mathcal{X}} f(l_i, p_i) \qquad (3)$$

is called a generalized quasiarithmetic penalty. Further, 
if $f$ is convex in $l$, it is called a generalized quasiarithmetic 
convex penalty. 

As indicated, quasiarithmetic penalties, mapped with 
$\varphi$ using $f(l_i, p_i) = p_i \varphi(l_i)$ to $\tilde{L}(p, l, f) = \varphi(L(p, l, \varphi))$, 
are differentially monotonic, and thus can be considered 
a special case of differentially monotonic generalized 
quasiarithmetic penalties. 
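
Definitions 1 and 2 can be made concrete with a short Python sketch. It evaluates the generalized penalty as a plain sum of per-symbol costs and tests the differential monotonicity inequality on a finite grid of lengths and probabilities. The helper names and the grid are assumptions made for illustration; a finite grid can refute the property but cannot prove it in general, and the check assumes finite-valued $f$ on the grid.

```python
import math

def generalized_penalty(p, lengths, f):
    """Generalized quasiarithmetic penalty: sum_i f(l_i, p_i)."""
    return sum(f(l, pi) for pi, l in zip(p, lengths))

def is_differentially_monotonic(f, probs, max_len):
    """Check Definition 1 for l = 2..max_len and all pairs from probs.

    Requires f(l, p_i) - f(l-1, p_i) > f(l, p_j) - f(l-1, p_j)
    whenever p_i > p_j and f(l-1, p_i) is finite.
    """
    for l in range(2, max_len + 1):
        for pi in probs:
            for pj in probs:
                if pi > pj and math.isfinite(f(l - 1, pi)):
                    if not (f(l, pi) - f(l - 1, pi) > f(l, pj) - f(l - 1, pj)):
                        return False
    return True

# Cost functions of the form p * phi(l), plus one that ignores p entirely.
quadratic_cost = lambda l, p: p * (l + l * l)
exponential_cost = lambda l, p: p * 2.0 ** (0.5 * l)
length_only_cost = lambda l, p: float(l)   # not differentially monotonic

probs = [0.4, 0.3, 0.2, 0.1]
lengths = [1, 2, 3, 3]
total = generalized_penalty(probs, lengths, quadratic_cost)
```

The cost that depends only on length fails the test because the marginal cost of an extra bit is the same for every symbol, so the strict inequality cannot hold.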

In this paper, we seek properties of and algorithms for 
solving problems of this form, occasionally with some 
restrictions (e.g., to convexity of $\varphi$). In the next section, 
we provide examples of the problem in question. In 
Section III, we investigate Campbell's quasiarithmetic 
penalties, expanding beyond Campbell's properties for 
a certain class of $\varphi$ that we call subtranslatory. This 
will extend properties (entropy bounds, existence 
of optimal codes) previously known only for linear 
$\varphi$ and, in the case of entropy bounds, for $\varphi$ of the 
exponential form $\varphi(x) = D^{tx}$. These properties pertain 
both to finite and infinite input alphabets, and some 
are applicable beyond subtranslatory penalties. We then 
turn to algorithms for finding an optimal code for finite 
alphabets in Section IV; we start by presenting and 
extending an alternative to code tree notation, nodeset 
notation, originally introduced in [6]. Along with the 
Coin Collector's problem, this notation can aid in solving 
coding problems with generalized quasiarithmetic convex 
penalties. We explain, prove, and refine the resulting 
algorithm, which is $O(n^2)$ time and $O(n)$ space when 
minimizing for a differentially monotonic generalized 
quasiarithmetic penalty; the algorithm can be extended to 
other penalties with a like or slightly greater complexity. 
This is an improvement, for example, on a result of 
Larmore, who in [6] presented an $O(n^3)$-time $O(n^3)$-space 
algorithm for cost function (2) in order to optimize 
a more complicated penalty related to communications 
delay. Our result thus improves overall performance for 
the quadratic problem and offers an efficient solution 
for the more general convex quasiarithmetic problem. 
Conclusions are presented in Section V. 

II. Examples 

The additive convex coding problem considered here 
is quite broad. Examples include 

$$f(l_i, p_i) = p_i l_i^a \qquad (\varphi(x) = x^a)$$

for $a \ge 1$, the moment penalty; see, e.g., [8, pp. 121-122]. 
Although efficient solutions have been given for 
$a = 1$ (the Huffman case) and $a = 2$ (the quadratic 
moment), no polynomial-time algorithms have been proposed 
for the general case. 

The quadratic moment was considered by Larmore in 
[6] as a special case of the quadratic problem (2), which 
is perhaps the case of greatest relevance. Restating this 
problem in terms of $f$, 

$$f(l_i, p_i) = p_i(\alpha l_i + \beta l_i^2) \qquad (\varphi(x) = \alpha x + \beta x^2).$$

This was solved with cubic space and time complexity 
as a step in solving a problem related to message delay. 
This larger problem, treated first by Humblet [3] then 
Flores [9], was solved with an $O(n^5)$-time $O(n^3)$-space 
algorithm that can be altered to become an $O(n^4)$-time 
$O(n^2)$-space algorithm using methods in this paper. 

Another quasiarithmetic penalty is the exponential 
penalty, that brought about by the cost function 



$$f(l_i, p_i) = p_i D^{t l_i} \qquad (\varphi(x) = D^{tx}) \qquad (4)$$

for $t > 0$, $D$ being the size of the output alphabet. 
This was previously proposed by Campbell [4] and 
algorithmically solved as an extension of Huffman's 
algorithm (and thus with linear time and space for sorted 
probability inputs) in [3], [7], [10], [11]. As previously 
indicated, in [3], [7] this is a step in minimizing the 
probability of buffer overflow in a queueing system. 
Thus the quasiarithmetic framework includes the two 
queueing-related source coding problems discussed in 
[3]. 

A related problem is that with the concave cost 
function 

$$f(l_i, p_i) = p_i(1 - D^{t l_i}) \qquad (\varphi(x) = 1 - D^{tx})$$

for $t < 0$, which has a similar solution [7]. This problem 
relates to a problem in [12] which is based on a scenario 
presented by Rényi in [13]. 

Whereas all of the above, being continuous in $l_i$ and 
linear in $p_i$, are within the class of cases considered by 
Campbell, the following convex problem is not, in that 
its range includes infinity. Suppose we want the best code 
possible with the constraint that all codes must fit into a 
structure with $l_{\max}$ symbols. If our measure of the "best 
code" is linear, then the appropriate penalty is 

$$f(l_i, p_i) = \begin{cases} p_i l_i, & l_i \le l_{\max} \\ \infty, & l_i > l_{\max} \end{cases} \qquad (5)$$

for some fixed $l_{\max} \ge \lceil \log_D n \rceil$. This describes the 
length-limited linear penalty, algorithmically solved efficiently 
using the Package-Merge algorithm in [14] (with 
the assumption that $D = 2$). This approach will be a 
special case of our coding algorithm. 

Note that if the measure of a "best code" is nonlinear, 
a combination of penalties should be used where length 
is limited. For example, if we wish to minimize the 
probability of buffer overflow in a queueing system 
with a limited length constraint, we should combine (4) 
and (5): 

$$f(l_i, p_i) = \begin{cases} p_i D^{t l_i}, & l_i \le l_{\max} \\ \infty, & l_i > l_{\max} \end{cases} \qquad (6)$$

This problem can be solved via dynamic programming 
in a manner similar to [15], but this approach takes 
$\Omega(n^2 l_{\max})$ time and $\Omega(n^2)$ space for $D = 2$ and greater 
complexity for $D > 2$ [16]. Our approach improves on 
this considerably. 
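
The length-limited costs (5) and (6) share one pattern: an underlying cost that applies up to $l_{\max}$ and an infinite cost beyond it. A minimal Python sketch of that pattern follows; the wrapper name `length_limited` and the parameter values are illustrative assumptions, not notation from the paper.

```python
import math

def length_limited(base_cost, l_max):
    """Wrap a cost f(l, p) so that lengths above l_max cost infinity."""
    def f(l, p):
        return base_cost(l, p) if l <= l_max else math.inf
    return f

def linear_cost(l, p):
    """Huffman's cost: p * l."""
    return p * l

def exponential_cost(t, D=2):
    """Campbell's exponential cost: p * D^(t*l), for t > 0."""
    return lambda l, p: p * D ** (t * l)

# Cost (5): length-limited linear penalty with l_max = 3.
f5 = length_limited(linear_cost, 3)
# Cost (6): length-limited exponential penalty with t = 0.5, D = 2, l_max = 3.
f6 = length_limited(exponential_cost(0.5), 3)
```

Because the wrapper leaves the base cost untouched up to $l_{\max}$, any convex base cost remains convex in $l$ over the extended range that includes infinity.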

In addition to the above problems with previously 
known applications — and penalties which result from 
combining these problems — one might want to solve 
for a different utility function in order to find a com- 
promise among these applications or another trade-off 
of codeword lengths. These functions need not be like 
Campbell's in that they need not be linear in p; for 
example, consider 

$$f(l_i, p_i) = (1 - p_i)^{-l_i}.$$

Although the author knows of no use for this particular 
cost function, it is notable as corresponding to one of 
the simplest convex-cost penalties of the form (3). 

A. Bounds and the Subtranslatory Property 



Campbell's quasiarithmetic penalty formulation can be 
restated as follows: 



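Given       $p = (p_1, \ldots, p_n)$, $p_i > 0$, $\sum_i p_i = 1$; 
            convex, monotonically increasing $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ 
Minimize    (over $l$) $L(p, l, \varphi) = \varphi^{-1}\left(\sum_i p_i \varphi(l_i)\right)$ 
subject to  $\sum_i 2^{-l_i} \le 1$; $l_i \in \mathbb{Z}_+$ 
                                                              (7)

In the case of linear $\varphi$, the integer constraint is often 
removed to obtain bounds related to entropy, as we do 
in the nonlinear case: 

Given       $p = (p_1, \ldots, p_n)$, $p_i > 0$, $\sum_i p_i = 1$; 
            convex, monotonically increasing $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ 
Minimize    (over $l$) $L(p, l, \varphi) = \varphi^{-1}\left(\sum_i p_i \varphi(l_i)\right)$ 
subject to  $\sum_i 2^{-l_i} \le 1$ 
                                                              (8)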

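Note that, given $p$ and $\varphi$, $L^\dagger$, the minimum for the 
relaxed (real-valued) problem (8), will necessarily be 
less than or equal to $L^*$, the minimum for the original 
(integer-constrained) problem (7). Let $l^\dagger$ and $l^*$ 
be corresponding minimizing values for the relaxed 
and constrained problems, respectively. Restating, and 
adding a fifth definition: 

$$\begin{aligned}
L^* &\triangleq \min_{\sum_i D^{-l_i} \le 1,\ l \in \mathbb{Z}_+^n} L(p, l, \varphi) \\
l^* &\triangleq \arg\min_{\sum_i D^{-l_i} \le 1,\ l \in \mathbb{Z}_+^n} L(p, l, \varphi) \\
L^\dagger &\triangleq \min_{\sum_i D^{-l_i} \le 1,\ l \in \mathbb{R}_+^n} L(p, l, \varphi) \\
l^\dagger &\triangleq \arg\min_{\sum_i D^{-l_i} \le 1,\ l \in \mathbb{R}_+^n} L(p, l, \varphi) \\
l^\sharp &\triangleq \left(\lceil l_1^\dagger \rceil, \lceil l_2^\dagger \rceil, \ldots, \lceil l_n^\dagger \rceil\right)
\end{aligned}$$

This is a slight abuse of arg min notation since $L^*$ could 
have multiple corresponding optimal length distributions 
($l^*$). However, this is not a problem, as any such value 
will suffice. Note too that $l^\sharp$ satisfies the Kraft inequality 
and the integer constraint, and thus $L(p, l^\sharp, \varphi) \ge L^*$. 

We obtain bounds for the optimal solution by noting 
that, since $\varphi$ is monotonically increasing, 

$$\begin{aligned}
\varphi^{-1}\left(\sum_i p_i \varphi(l_i^\dagger)\right) &\le \varphi^{-1}\left(\sum_i p_i \varphi(l_i^*)\right) \\
&\le \varphi^{-1}\left(\sum_i p_i \varphi(l_i^\sharp)\right) \\
&< \varphi^{-1}\left(\sum_i p_i \varphi(l_i^\dagger + 1)\right).
\end{aligned} \qquad (9)$$

These bounds are similar to Shannon redundancy 
bounds for Huffman coding. In the linear/Shannon case, 
$l_i^\dagger = -\log_2 p_i$, so the last expression is $\sum_i p_i(l_i^\dagger + 1) = 
1 + \sum_i p_i l_i^\dagger = 1 + H(p)$, where $H(p)$ is the Shannon 
entropy, so $H(p) \le \sum_i p_i l_i^* < 1 + H(p)$. These Shannon 
bounds can be extended to quasiarithmetic problems by 
first defining $\varphi$-entropy as follows: 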

Definition 3: Generalized entropy or $\varphi$-entropy is 

$$H(p, \varphi) = \inf_{\sum_i D^{-l_i} \le 1,\ l \in \mathbb{R}_+^n} L(p, l, \varphi) \qquad (10)$$

where here infimum is used because this definition 
applies to codes with infinite, as well as finite, input 
alphabets [4]. 
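
For the linear (Shannon) case, the chain of bounds in (9) can be checked numerically. The sketch below rounds the real-valued optimum $-\log_2 p_i$ up to integers and confirms that the resulting mean length lies within one bit of the Shannon entropy. The function names and the sample distribution are illustrative assumptions; the rounded-up lengths are a feasible code but not necessarily the optimal (Huffman) one.

```python
import math

def shannon_entropy(p):
    """H(p) = -sum_i p_i log2 p_i, in bits."""
    return -sum(pi * math.log2(pi) for pi in p)

def rounded_up_lengths(p):
    """Integer lengths ceil(-log2 p_i), obtained by rounding up the relaxed optimum."""
    return [math.ceil(-math.log2(pi)) for pi in p]

# Hypothetical non-dyadic source.
p = [0.4, 0.3, 0.2, 0.1]
lengths = rounded_up_lengths(p)
kraft = sum(2.0 ** (-l) for l in lengths)
mean_length = sum(pi * l for pi, l in zip(p, lengths))
entropy = shannon_entropy(p)
```

Since an optimal code can only do better than the rounded-up lengths, its mean length is sandwiched between `entropy` and `mean_length`, which is itself below `entropy + 1`.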

Campbell defined this as a generalized entropy [4]; 
we go further, by asking which cost functions, $\varphi$, have 
the following property: 

$$H(p, \varphi) \le L(p, l^*, \varphi) < 1 + H(p, \varphi) \qquad (11)$$

These bounds exist for the exponential case (4) with 
$H(p, \varphi) = H_\alpha(p)$, where $\alpha = (1 + t)^{-1}$, and $H_\alpha(p)$ 
denotes Rényi $\alpha$-entropy [17]. The bounds extend to 
exponential costs because they share with the linear 
costs (and only those costs) a property known as the 
translatory property, described by Aczél [18], among 
others: 

Definition 4: A cost function $\varphi$ (and its associated 
penalty) is translatory if, for any $l \in \mathbb{R}_+^n$, probability 
mass function $p$, and $c \in \mathbb{R}_+$, 

$$L(p, l + c, \varphi) = L(p, l, \varphi) + c$$

where $l + c$ denotes adding $c$ to each $l_i$ in $l$ [18]. 

We broaden the collection of penalty functions satisfying 
such bounds by replacing the translatory equality 
with an inequality, introducing the concept of a subtranslatory 
penalty: 

Definition 5: A cost function $\varphi$ (and its associated 
penalty) is subtranslatory if, for any $l \in \mathbb{R}_+^n$, probability 
mass function $p$, and $c \in \mathbb{R}_+$, 

$$L(p, l + c, \varphi) \le L(p, l, \varphi) + c.$$

For such a penalty, (11) still holds. 
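
Definitions 4 and 5 can be probed directly by computing the gap $L(p, l, \varphi) + c - L(p, l + c, \varphi)$, which is zero for a translatory cost and nonnegative for a subtranslatory one. The following Python sketch does this for one exponential and one quadratic cost; the function names, the sample point, and the shift `c` are illustrative assumptions, and a single evaluation is only a spot check, not a proof.

```python
import math

def penalty(p, lengths, phi, phi_inv):
    """Quasiarithmetic penalty phi^{-1}( sum_i p_i phi(l_i) )."""
    return phi_inv(sum(pi * phi(li) for pi, li in zip(p, lengths)))

def translation_gap(p, lengths, c, phi, phi_inv):
    """L(p, l) + c - L(p, l + c): zero if translatory, >= 0 if subtranslatory."""
    shifted = [li + c for li in lengths]
    return penalty(p, lengths, phi, phi_inv) + c - penalty(p, shifted, phi, phi_inv)

p = [0.5, 0.25, 0.25]
lengths = [1, 2, 2]
c = 1.5

# Exponential cost 2^x (D = 2, t = 1) and quadratic cost x + x^2 (alpha = beta = 1).
exp_phi = lambda x: 2.0 ** x
exp_inv = lambda y: math.log2(y)
quad_phi = lambda x: x + x * x
quad_inv = lambda y: (-1 + math.sqrt(1 + 4 * y)) / 2

gap_exp = translation_gap(p, lengths, c, exp_phi, exp_inv)
gap_quad = translation_gap(p, lengths, c, quad_phi, quad_inv)
```

At this sample point the exponential gap vanishes up to rounding error, while the quadratic gap is strictly positive.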

If $\varphi$ obeys certain regularity requirements, then we 
can introduce a necessary and sufficient condition for it 
to be subtranslatory. Suppose that the invertible function 
$\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ is real analytic over a relevant compact 
interval. We might choose this interval to be, for example, 
$A = [\delta, 1/\delta]$ for some $\delta \in (0, 1)$. (Let $\delta \to 0$ to 
show the following argument is valid over all $\mathbb{R}_+$.) We 
assume $\varphi^{-1}$ is also real analytic (with respect to interval 
$\varphi(A)$). Thus all derivatives of the function and its inverse 
are bounded. 



Theorem 1: Given real analytic cost function $\varphi$ and 
its real analytic inverse $\varphi^{-1}$, $\varphi$ is subtranslatory if and 
only if, for all positive $l$ and all positive $p$ summing to 1, 

$$\sum_i p_i \varphi'(l_i) \le \varphi'\left(\varphi^{-1}\left(\sum_i p_i \varphi(l_i)\right)\right) \qquad (12)$$

where $\varphi'$ is the derivative of $\varphi$. 
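
Condition (12) is straightforward to evaluate at a given point, which gives a quick way to refute the subtranslatory property for a candidate cost. The Python sketch below computes the slack (right-hand side minus left-hand side) using a bisection inverse; the function names, the bisection bracket, and the test point are assumptions made for illustration. A nonnegative slack at one point proves nothing, whereas a negative slack is a counterexample.

```python
def inverse_by_bisection(phi, y, lo=0.0, hi=1e6, iters=200):
    """Numerically invert an increasing phi on [lo, hi] (assumes phi(lo) <= y <= phi(hi))."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if phi(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def condition_12_slack(p, lengths, phi, dphi):
    """phi'(phi^{-1}(sum p_i phi(l_i))) - sum p_i phi'(l_i); negative means (12) fails here."""
    lhs = sum(pi * dphi(li) for pi, li in zip(p, lengths))
    y = sum(pi * phi(li) for pi, li in zip(p, lengths))
    rhs = dphi(inverse_by_bisection(phi, y))
    return rhs - lhs

# Illustrative test point (real-valued lengths are allowed in (12)).
p = [1 / 3, 2 / 3]
lengths = [0.5, 1.0]

slack_linear = condition_12_slack(p, lengths, lambda x: x, lambda x: 1.0)
slack_quadratic = condition_12_slack(p, lengths, lambda x: x + x * x, lambda x: 1 + 2 * x)
slack_cubic = condition_12_slack(p, lengths, lambda x: x ** 3 + 11 * x, lambda x: 3 * x * x + 11)
```

At this point the linear cost meets (12) with equality and the quadratic cost satisfies it strictly, while the convex cubic cost $x^3 + 11x$ violates it, so convexity alone does not imply the subtranslatory property.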

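Proof: First note that, since all values are positive, 
inequality (12) is equivalent to 

$$\left(\sum_i p_i \varphi'(l_i)\right) \cdot (\varphi^{-1})'\left(\sum_i p_i \varphi(l_i)\right) \le 1. \qquad (13)$$

We show that, when (13) is true everywhere, $\varphi$ is 
subtranslatory, and then we show the converse. Let $\epsilon > 0$. 
Using power expansions of the form 

$$g(x) + \epsilon g'(x) = g(x + \epsilon) \pm O(\epsilon^2)$$

on $\varphi$ and $\varphi^{-1}$, 

$$\begin{aligned}
\epsilon + \varphi^{-1}\left(\sum_i p_i \varphi(l_i)\right)
&\overset{(a)}{\ge} \epsilon \cdot \left(\sum_i p_i \varphi'(l_i)\right) \cdot (\varphi^{-1})'\left(\sum_i p_i \varphi(l_i)\right) + \varphi^{-1}\left(\sum_i p_i \varphi(l_i)\right) \\
&\overset{(b)}{=} \varphi^{-1}\left(\sum_i p_i \varphi(l_i) + \epsilon \cdot \sum_i p_i \varphi'(l_i)\right) \pm O(\epsilon^2) \\
&\overset{(c)}{=} \varphi^{-1}\left(\sum_i p_i \varphi(l_i + \epsilon) \pm O(\epsilon^2)\right) \pm O(\epsilon^2) \\
&\overset{(d)}{=} \varphi^{-1}\left(\sum_i p_i \varphi(l_i + \epsilon)\right) \pm O(\epsilon^2).
\end{aligned} \qquad (14)$$

Step (a) is due to (13), step (b) due to the power 
expansion on $\varphi^{-1}$, step (c) due to the power expansion 
on $\varphi$, and step (d) due to the power expansion on $\varphi^{-1}$ 
(where the bounded derivative of $\varphi^{-1}$ allows for the 
asymptotic term to be brought outside the function). 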
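Next, evoke the above inequality $c/\epsilon$ times: 

$$\begin{aligned}
\varphi^{-1}\left(\sum_i p_i \varphi(l_i + c)\right)
&\le \epsilon + \varphi^{-1}\left(\sum_i p_i \varphi(l_i + c - \epsilon)\right) \pm O(\epsilon^2) \\
&\le 2\epsilon + \varphi^{-1}\left(\sum_i p_i \varphi(l_i + c - 2\epsilon)\right) \pm 2 \cdot O(\epsilon^2) \\
&\le \cdots \\
&\le c + \varphi^{-1}\left(\sum_i p_i \varphi(l_i)\right) \pm O(\epsilon).
\end{aligned} \qquad (15)$$

Taking $\epsilon \to 0$ yields $L(p, l + c, \varphi) \le L(p, l, \varphi) + c$. 
Thus, the fact of (12) is sufficient to know that the 
penalty is subtranslatory. 

To prove the converse, suppose $\sum_i p_i \varphi'(l_i) > 
\varphi'\left(\varphi^{-1}\left(\sum_i p_i \varphi(l_i)\right)\right)$ for some valid $l$ and $p$. Because 
$\varphi$ is analytic, continuity implies that there exist $\delta_0 > 0$ 
and $\epsilon_0 > 0$ such that 

$$\sum_i p_i \varphi'(l'_i) \ge (1 + \delta_0) \cdot \varphi'\left(\varphi^{-1}\left(\sum_i p_i \varphi(l'_i)\right)\right)$$

for all $l' \in [l, l + \epsilon_0)$. The chain of inequalities above 
reverse in this range with the additional multiplicative 
constant. Thus (14) becomes 

$$(1 + \delta_0)\epsilon + \varphi^{-1}\left(\sum_i p_i \varphi(l'_i)\right) \le \varphi^{-1}\left(\sum_i p_i \varphi(l'_i + \epsilon)\right) \pm O(\epsilon^2)$$

for $l' \in [l, l + \epsilon_0)$, and (15) becomes, for any $c \in (0, \epsilon_0)$, 

$$\varphi^{-1}\left(\sum_i p_i \varphi(l_i + c)\right) \ge (1 + \delta_0)c + \varphi^{-1}\left(\sum_i p_i \varphi(l_i)\right) \pm O(\epsilon)$$

which, taking $\epsilon \to 0$, similarly leads to 

$$L(p, l + c, \varphi) \ge (1 + \delta_0)c + L(p, l, \varphi) > c + L(p, l, \varphi)$$

and thus the subtranslatory property fails and the converse 
is proved. ■ 

Therefore, for $\varphi$ satisfying (12), we have the bounds 
of (11) for the optimum solution. Note that the right-hand 
side of (12) may also be written $\varphi'(L(p, l, \varphi))$; 
thus (12) indicates that the average derivative of $\varphi$ at 
the codeword length values is at most the derivative of 
$\varphi$ at the value of the penalty for those length values. 

The linear and exponential penalties satisfy these 
equivalent inequalities with equality. Another family of 
cost functions that satisfies the subtranslatory property 
is $\varphi(l_i) = l_i^a$ for fixed $a > 1$, which corresponds to 

$$L(p, l, \varphi) = \left(\sum_i p_i l_i^a\right)^{1/a}.$$

Proving this involves noting that Lyapunov's inequality 
for moments of a random variable yields 

$$\left(\sum_i p_i l_i^{a-1}\right)^{\frac{1}{a-1}} \le \left(\sum_i p_i l_i^a\right)^{\frac{1}{a}}$$

which leads to 

$$\sum_i p_i l_i^{a-1} \le \left(\sum_i p_i l_i^a\right)^{\frac{a-1}{a}}$$

which, because $\varphi'(x) = a x^{a-1}$, is 

$$\sum_i p_i \varphi'(l_i) \le \varphi'\left(\varphi^{-1}\left(\sum_i p_i \varphi(l_i)\right)\right),$$

the inequality we desire. 

Another subtranslatory penalty is the quadratic 
quasiarithmetic penalty of (2), in which 

$$\varphi(x) = \alpha x + \beta x^2$$

for $\alpha, \beta \ge 0$. This has already been shown for $\beta = 0$; 
when $\beta > 0$, 

$$\begin{aligned}
L(p, l, \varphi) &= \sqrt{\sum_i p_i \left(\frac{\alpha}{\beta} l_i + l_i^2\right) + \frac{\alpha^2}{4\beta^2}} - \frac{\alpha}{2\beta} \\
\varphi'(x) &= \alpha + 2\beta x \\
\varphi^{-1}(x) &= \sqrt{\frac{x}{\beta} + \frac{\alpha^2}{4\beta^2}} - \frac{\alpha}{2\beta}.
\end{aligned}$$

We achieve the desired inequality through algebra, the first 
line following from $\sum_i p_i l_i^2 \ge \left(\sum_i p_i l_i\right)^2$: 

$$\begin{aligned}
\alpha^2 + 4\beta \sum_i p_i(\alpha l_i + \beta l_i^2) &\ge \left(\sum_i p_i(\alpha + 2\beta l_i)\right)^2 \\
\sqrt{\alpha^2 + 4\beta \sum_i p_i(\alpha l_i + \beta l_i^2)} &\ge \sum_i p_i(\alpha + 2\beta l_i) \\
\varphi'(L(p, l, \varphi)) &\ge \sum_i p_i \varphi'(l_i)
\end{aligned}$$

We thus have an important property that holds for several 
cases of interest. 

One might be tempted to conclude that every $\varphi$, 
or every convex and/or concave $\varphi$, is subtranslatory. 
However, this is easily disproved. Consider convex 
$\varphi(x) = x^3 + 11x$. Using Cardano's formula, it is 
easily seen that (12) does not hold for $p = (\frac{1}{3}, \frac{2}{3})$ 
and $l = (\frac{1}{2}, 1)$. The subtranslatory test also fails for 
$\varphi(x) = \sqrt{x}$. Thus we must test any given penalty for the 
subtranslatory property in order to use the redundancy 
bounds. 

B. Existence of an Optimal Code 

Because all costs are positive, the redundancy bounds 
that are a result of a subtranslatory penalty extend 
to infinite alphabet codes in a straightforward manner. 
These bounds thus show that a code with finite penalty 
exists if and only if the generalized entropy is finite, a 
property we extend to nonsubtranslatory penalties in the 
next subsection. However, one must be careful regarding 
the meaning of an "optimal code" when there are an 
infinite number of possible codes satisfying the Kraft 
inequality with equality. Must there exist an optimal 
code, or can there be an infinite sequence of codes of 
decreasing penalty without a code achieving the limit 
penalty value? 

Fortunately, the answer is the former, as the existence 
results of Linder, Tarokh, and Zeger in [19] can be extended 
to quasiarithmetic penalties. Consider continuous 
strictly monotonic $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ (as proposed by 
Campbell) and $p = (p_1, p_2, \ldots)$ such that 

$$L^*(p, \varphi) = \inf_{\sum_i D^{-l_i} \le 1,\ l_i \in \mathbb{Z}_+} \varphi^{-1}\left(\sum_{i=1}^{\infty} p_i \varphi(l_i)\right) \qquad (16)$$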



is finite. Consider, for an arbitrary $n \in \mathbb{Z}_+$, optimizing 
for $\varphi$ with weights 

$$p^{(n)} = (p_1, p_2, \ldots, p_n, 0, 0, \ldots).$$

(We call the entries to this distribution "weights" because 
they do not necessarily add up to 1.) Denote the optimal 
code a truncated code, one with codeword lengths 

$$l^{(n)} = \{l_1^{(n)}, l_2^{(n)}, \ldots, l_n^{(n)}, \infty, \infty, \ldots\}.$$

Thus, for convenience, $l_i^{(j)} = \infty$ for $i > j$. These lengths 
are also optimal for $\left(\sum_{j=1}^{n} p_j\right)^{-1} \cdot p^{(n)}$, the distribution 
of normalized weights. 

Following [19], we say that a sequence of codeword 
length distributions $l^{(1)}, l^{(2)}, l^{(3)}, \ldots$ converges to an infinite 
prefix code with codeword lengths $l = \{l_1, l_2, \ldots\}$ 
if, for each $i$, the $i$th length in each distribution in the 
sequence is eventually $l_i$ (i.e., if each sequence converges 
to $l_i$). 

Theorem 2: Given quasiarithmetic increasing $\varphi$ and $p$ 
such that $L^*(p, \varphi)$ is finite, the following hold: 

1) There exists a sequence of truncated codeword 
lengths that converges to optimal codeword lengths 
for $p$; thus the infimum is achievable. 
2) Any optimal code for $p$ must satisfy the Kraft 
inequality with equality. 
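Proof: Because here we are concerned only with 
cases in which the first length is at least 1, we may restrict 
ourselves to the domain $[\varphi^{-1}(p_1 \varphi(1)), \infty)$. Recall 

$$L^*(p, \varphi) = \inf_{\sum_i D^{-l_i} \le 1,\ l_i \in \mathbb{Z}_+} \varphi^{-1}\left(\sum_{i=1}^{\infty} p_i \varphi(l_i)\right) < \infty.$$

Then there exists near-optimal $l' = \{l'_1, l'_2, l'_3, \ldots\} \in 
\mathbb{Z}_+^{\infty}$ such that 

$$\varphi^{-1}\left(\sum_{i=1}^{\infty} p_i \varphi(l'_i)\right) < L^*(p, \varphi) + 1 \quad \text{and} \quad \sum_{i=1}^{\infty} D^{-l'_i} \le 1,$$

and thus, for any integer $n$, 

$$\varphi^{-1}\left(\sum_{i=1}^{n} p_i \varphi(l'_i)\right) < L^*(p, \varphi) + 1 \quad \text{and} \quad \sum_{i=1}^{n} D^{-l'_i} < 1.$$

So, using this to approximate the behavior of a minimizing 
$l^{(n)}$, we have 

$$\varphi^{-1}\left(\sum_{i=1}^{n} p_i \varphi(l_i^{(n)})\right) \le \varphi^{-1}\left(\sum_{i=1}^{n} p_i \varphi(l'_i)\right) < L^*(p, \varphi) + 1$$

yielding an upper bound on terms 

$$p_j \varphi(l_j^{(n)}) \le \sum_{i=1}^{n} p_i \varphi(l_i^{(n)}) < \varphi(L^*(p, \varphi) + 1)$$

for all $j$. This implies 

$$l_j^{(n)} < \varphi^{-1}\left(\frac{\varphi(L^*(p, \varphi) + 1)}{p_j}\right).$$

Thus, for any $i \in \mathbb{Z}_+$, the sequence $l_i^{(1)}, l_i^{(2)}, l_i^{(3)}, \ldots$ 
is bounded for all $l_i^{(n)} \ne \infty$, and thus has a finite set of 
values (including $\infty$). It is shown in [19] that this suffices 
for the desired convergence, but for completeness a 
slightly altered proof follows. 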

Because each sequence $l_i^{(1)}, l_i^{(2)}, l_i^{(3)}, \ldots$ has a finite 
set of values, every infinite indexed subsequence for 
a given $i$ has a convergent subsequence. An inductive 
argument implies that, for any $k$, there exists a subsequence 
indexed by $n_j^k$ such that $l_i^{(n_1^k)}, l_i^{(n_2^k)}, l_i^{(n_3^k)}, \ldots$ 
converges for all $i \le k$, where $l_i^{(n_1^k)}, l_i^{(n_2^k)}, l_i^{(n_3^k)}, \ldots$ 
is a subsequence of $l_i^{(n_1^{k'})}, l_i^{(n_2^{k'})}, l_i^{(n_3^{k'})}, \ldots$ for $k' \le 
k$. Codeword length distributions $l^{(n_1^1)}, l^{(n_2^2)}, l^{(n_3^3)}, \ldots$ 
(which we call $l^{(n_1)}, l^{(n_2)}, l^{(n_3)}, \ldots$) thus converge to 
the codeword lengths of an infinite code $C$ with codeword 
lengths $l = \{l_1, l_2, l_3, \ldots\}$. Clearly each codeword 
length distribution satisfies the Kraft inequality. The limit 
does as well then; were it exceeded, we could find $i'$ such 
that 

$$\sum_{i=1}^{i'} D^{-l_i} > 1$$

and thus $n'$ such that 

$$\sum_{i=1}^{i'} D^{-l_i^{(n')}} > 1,$$

causing a contradiction. 

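We now show that $C$ is optimal. Let $\{\lambda_1, \lambda_2, \lambda_3, \ldots\}$ 
be the codeword lengths of an arbitrary prefix code. For 
every $k$, there is a $j \ge k$ such that $l_i = l_i^{(n_m)}$ for any 
$i \le k$ if $m \ge j$. Due to the optimality of each $l^{(n)}$, for 
all $m \ge j$: 

$$\begin{aligned}
\sum_{i=1}^{k} p_i \varphi(l_i) = \sum_{i=1}^{k} p_i \varphi(l_i^{(n_m)})
&\le \sum_{i=1}^{n_m} p_i \varphi(l_i^{(n_m)}) \\
&\le \sum_{i=1}^{n_m} p_i \varphi(\lambda_i) \\
&\le \sum_{i=1}^{\infty} p_i \varphi(\lambda_i)
\end{aligned}$$

and, taking $k \to \infty$, $\sum_i p_i \varphi(l_i) \le \sum_i p_i \varphi(\lambda_i)$, leading 
directly to $\varphi^{-1}\left(\sum_i p_i \varphi(l_i)\right) \le \varphi^{-1}\left(\sum_i p_i \varphi(\lambda_i)\right)$ and 
the optimality of $C$. 

Suppose the Kraft inequality is not satisfied with 
equality for optimal codeword lengths $l = \{l_1, l_2, \ldots\}$. 
We can then produce a strictly superior code. There 
is a $k \in \mathbb{Z}_+$ such that $D^{-l_k + 1} + \sum_i D^{-l_i} \le 1$. 
Consider code $\{l_1, l_2, \ldots, l_{k-1}, l_k - 1, l_{k+1}, l_{k+2}, \ldots\}$. 
This code satisfies the Kraft inequality and has 
penalty $\varphi^{-1}\left(\sum_i p_i \varphi(l_i) + p_k(\varphi(l_k - 1) - \varphi(l_k))\right) < 
\varphi^{-1}\left(\sum_i p_i \varphi(l_i)\right)$. Thus $l$ is not optimal. Therefore 
the Kraft inequality must be satisfied with equality for 
optimal infinite codes. ■ 
Note that this theorem holds not just for subtranslatory 
penalties, but for any quasiarithmetic penalty. 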

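C. Finiteness of Penalty for an Optimal Code 

Recall the definition of (10), 

$$H(p, \varphi) = \inf_{\sum_i D^{-l_i} \le 1,\ l \in \mathbb{R}_+^n} \varphi^{-1}\left(\sum_i p_i \varphi(l_i)\right)$$

for $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$. 

Theorem 3: If $H(p, \varphi)$ is finite and either $\varphi$ is subtranslatory 
or $\varphi(x + 1) = O(\varphi(x))$ (which includes all 
concave and all polynomial $\varphi$), then the coding problem 
of (16), 

$$L^*(p, \varphi) = \inf_{\sum_i D^{-l_i} \le 1,\ l_i \in \mathbb{Z}_+} \varphi^{-1}\left(\sum_i p_i \varphi(l_i)\right)$$

has a minimizing $l^*$ resulting in a finite value for 
$L^*(p, \varphi)$. 

Proof: If $\varphi$ is subtranslatory, then $L^*(p, \varphi) < 1 + 
H(p, \varphi) < \infty$. If $\varphi(x + 1) = O(\varphi(x))$, then there are 
$\alpha, \beta > 0$ such that $\varphi(x + 1) \le \alpha + \beta\varphi(x)$ for all $x$. 
Then 

$$L(p, l + 1, \varphi) = \varphi^{-1}\left(\sum_i p_i \varphi(l_i + 1)\right) \le \varphi^{-1}\left(\alpha + \beta \sum_i p_i \varphi(l_i)\right).$$

So 

$$\begin{aligned}
L^*(p, \varphi) &\le L(p, l^\dagger + 1, \varphi) \\
&\le \varphi^{-1}\left(\alpha + \beta\varphi(H(p, \varphi))\right) \\
&< \infty
\end{aligned}$$

and the infimum, which we know to also be a minimum, 
is finite. ■ 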



IV. Algorithms 
A. Nodeset Notation 

We now examine algorithms for finding minimum 
penalty codes for convex cases with finite alphabets. We 
first present a notation for codes based on an approach 
of Larmore [6]. This notation is an alternative to the 
well known code tree notation, e.g., [20], and it will 
be the basis for an algorithm to solve the generalized 
quasiarithmetic (and thus Campbell's quasiarithmetic) 
convex coding problem. 

In the literature nodeset notation is generally used 
for binary alphabets, not for general alphabet coding. 
Although we briefly sketch how to adapt this technique 
to general output alphabet coding later in this section, 
an approach fully explained in [21], until then 
we concentrate on the binary case ($D = 2$). 

The key idea: Each node $(i, l)$ represents both the 
share of the penalty $\tilde{L}(p, l, f)$ (weight) and the share 
of the Kraft sum $\kappa(l)$ (width) assumed for the $l$th bit 
of the $i$th codeword. If we show that total weight is an 
increasing function of the penalty and show a one-to-one 
correspondence between optimal nodesets and optimal 
codes, we can reduce the problem to an efficiently 
solvable problem, the Coin Collector's problem. 

In order to do this, we first assume bounds on the 
maximum codeword length of possible solutions, e.g., 
the maximum unary codeword length of $n - 1$. Alternatively, 
bounds might be explicit in the definition of the 
problem. Consider for example the length-limited coding 
problems of (5) and (6), upper bounded by $l_{\max}$. A third 
possibility is that maximum length may be implicit in 
some property of the set of optimal solutions [22]-[24]; 
we explore this in Subsection IV-E. 

We therefore restrict ourselves to codes with $n$ codewords, 
none of which has greater length than $l_{\max}$, where 
$l_{\max} \in [\lceil \log_2 n \rceil, n - 1]$. With this we now introduce the 
nodeset notation for binary coding: 

Definition 6: A node is an ordered pair of integers 
$(i, l)$ such that $i \in \{1, \ldots, n\}$ and $l \in \{1, \ldots, l_{\max}\}$. 
Call the set of all $n l_{\max}$ possible nodes $I$. Usually $I$ 
is arranged in a grid; see the example in the accompanying figure. The set 
of nodes, or nodeset, corresponding to item $i$ (assigned 
codeword $c_i$ with length $l_i$) is the set of the first $l_i$ 
nodes of column $i$, that is, $\eta_l(i) \triangleq \{(j, l) \mid j = 
i,\ l \in \{1, \ldots, l_i\}\}$. The nodeset corresponding to length 
distribution $l$ is $\eta(l) \triangleq \bigcup_i \eta_l(i)$; this corresponds to a set 
of $n$ codewords, a code. We say a node $(i, l)$ has width 
$\rho(i, l) \triangleq 2^{-l}$ and weight $\mu(i, l) \triangleq f(l, p_i) - f(l - 1, p_i)$, 
as in the figure's example. 

If $I$ has a subset $N$ that is a valid nodeset, then 
it is straightforward to find the corresponding length 
distribution and thus a code. We can find an optimal 
valid nodeset using the Coin Collector's problem. 
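
Definition 6 translates almost directly into code. The Python sketch below builds the nodeset of a length distribution and sums node widths and weights; exact rational arithmetic is used so the totals can be compared without rounding error. The function names and the sample source are illustrative assumptions, and the linear cost is used only as an example of $f$.

```python
from fractions import Fraction

def nodeset(lengths):
    """Nodes (i, l) for l = 1..l_i, with items numbered from 1 as in Definition 6."""
    return {(i, l) for i, li in enumerate(lengths, start=1) for l in range(1, li + 1)}

def width(node):
    """Width of node (i, l): 2^(-l)."""
    _, l = node
    return Fraction(1, 2 ** l)

def weight(node, p, f):
    """Weight of node (i, l): f(l, p_i) - f(l - 1, p_i)."""
    i, l = node
    return f(l, p[i - 1]) - f(l - 1, p[i - 1])

# Hypothetical dyadic source with its Huffman lengths and the linear cost p * l.
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
lengths = [1, 2, 3, 3]
linear_cost = lambda l, pi: pi * l

N = nodeset(lengths)
total_width = sum(width(v) for v in N)
total_weight = sum(weight(v, p, linear_cost) for v in N)
kraft = sum(Fraction(1, 2 ** l) for l in lengths)
```

For this example the total width equals the number of items minus the Kraft sum, and the total weight telescopes to the penalty minus the cost at length zero, which is the mean codeword length here.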

B. The Coin Collector's Problem 

Let $2^{\mathbb{Z}}$ denote the set of all integer powers of two. The 
Coin Collector's problem of size $m$ considers $m$ "coins" 
with width $\rho_i \in 2^{\mathbb{Z}}$; one can think of width as coin 
face value, e.g., $\rho_i = \frac{1}{4}$ for a quarter dollar (25 cents). 
Each coin also has weight $\mu_i \in \mathbb{R}$. The final problem 
parameter is total width, denoted $t$. The problem is then: 

Minimize    (over $B \subseteq \{1, \ldots, m\}$) $\sum_{i \in B} \mu_i$ 
subject to  $\sum_{i \in B} \rho_i = t$ 
                                                              (17)

We thus wish to choose coins with total width $t$ such 
that their total weight is as small as possible. This 
problem is an input-restricted variant of the knapsack 
problem, which, in general, is NP-hard; no polynomial-time 
algorithms are known for such NP-hard problems 
[25], [26]. However, given sorted inputs, a linear-time 
solution to (17) was proposed in [14]. The algorithm in 
question is called the Package-Merge algorithm. 

In the Appendix, we illustrate and prove a slightly 
simplified version of the Package-Merge algorithm. This 
algorithm allows us to solve the generalized quasiarithmetic 
convex coding problem (3). When we use this algorithm, 
we let $\mathcal{I}$ represent the $m$ items along with their 
weights and widths. The optimal solution to the problem 
is a function of total width $t$ and items $\mathcal{I}$. We denote this 
solution as $CC(\mathcal{I}, t)$ (read, "the [optimal] coin collection 
for $\mathcal{I}$ and $t$"). Note that, due to ties, this need not be 
unique, but we assume that one of the optimal solutions 
is chosen; at the end of Subsection IV-D we discuss 
which of the optimal solutions is best to choose. 
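
To make the problem statement (17) concrete, here is a deliberately naive Python sketch that solves small Coin Collector instances by exhaustive search. It is not the Package-Merge algorithm: it takes exponential time and is meant only as a reference definition. The function name and the sample coins are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations

def coin_collector_bruteforce(coins, t):
    """Exhaustively solve the Coin Collector's problem.

    coins: list of (width, weight) pairs, widths given as Fractions.
    Returns (best_total_weight, tuple_of_indices), or None if no subset
    has total width exactly t.
    """
    best = None
    indices = range(len(coins))
    for r in range(len(coins) + 1):
        for subset in combinations(indices, r):
            total_width = sum((coins[k][0] for k in subset), Fraction(0))
            if total_width == t:
                total_weight = sum(coins[k][1] for k in subset)
                if best is None or total_weight < best[0]:
                    best = (total_weight, subset)
    return best

# Hypothetical coins: two quarters, two halves, and one unit coin.
coins = [
    (Fraction(1, 4), 1),
    (Fraction(1, 4), 1),
    (Fraction(1, 2), 2),
    (Fraction(1, 2), 7),
    (Fraction(1, 1), 5),
]
solution = coin_collector_bruteforce(coins, Fraction(1))
```

Here the cheapest way to collect total width 1 uses the two quarters and the lighter half rather than the single unit coin.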

C. A General Algorithm 

We now formalize the reduction from the generalized 
quasiarithmetic convex coding problem to the Coin Col- 
lector's problem. 

We assert that any optimal solution $N$ of the Coin 
Collector's problem for $t = n - 1$ on coins $\mathcal{I} = I$ is a 
nodeset for an optimal solution of the coding problem. 
This yields a suitable method for solving generalized 
quasiarithmetic convex penalties. 

To show this reduction, first define $\rho(N)$ for any $N = 
\eta(l)$: 

$$\begin{aligned}
\rho(N) &\triangleq \sum_{(i,l) \in N} \rho(i, l) \\
&= \sum_{i=1}^{n} \sum_{l=1}^{l_i} 2^{-l} \\
&= \sum_{i=1}^{n} \left(1 - 2^{-l_i}\right) \\
&= n - \sum_{i=1}^{n} 2^{-l_i} \\
&= n - \kappa(l)
\end{aligned}$$

Because the Kraft inequality is $\kappa(l) \le 1$, $\rho(N)$ must 
lie in $[n - 1, n)$ for prefix codes. The Kraft inequality 
is satisfied with equality at the left end of this interval. 
Optimal binary codes have this equality satisfied, since 
a strict inequality implies that the longest codeword 
length can be shortened by one, strictly decreasing the 
penalty without violating the inequality. Thus the optimal 
solution has $\rho(N) = n - 1$. 

February 1, 2008 
DRAFT 
IEEE TRANSACTIONS ON INFORMATION THEORY 

Also define: 

$$\begin{aligned}
M_f(l, p) &\triangleq f(l, p) - f(l - 1, p) \\
L_0(p, f) &\triangleq \sum_{i=1}^{n} f(0, p_i) \\
\mu(N) &\triangleq \sum_{(i,l) \in N} \mu(i, l)
\end{aligned}$$
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Note that 

i=l i=l 

n n 

i=l i=l 

= L(p,l,f)-L Q (p,f). 

Lo(p, f) is a constant given fixed penalty and probability 
distribution. Thus, if the optimal nodeset corresponds to 
a valid code, solving the Coin Collector's problem solves 
this coding problem. To prove the reduction, we need to 
prove that the optimal nodeset indeed corresponds to a 
valid code. We begin with the following lemma: 

Lemma 1: Suppose that $N$ is a nodeset of width
$x 2^{-k} + r$ where $k$ and $x$ are integers and $0 < r < 2^{-k}$.
Then $N$ has a subset $R$ with width $r$.

Proof: We use induction on the cardinality of the
set. The base case $|N| = 1$ is trivial since then $x = 0$.
Assume the lemma holds for all $|N| < n$, and suppose
$|N| = n$. Let $\rho^* = \min_{j \in N} \rho_j$ and $j^* = \arg\min_{j \in N} \rho_j$.
We can see $\rho^*$ as the smallest contribution to the width
of $N$ and $r$ as the portion of the binary expansion of
the width of $N$ to the right of $2^{-k}$. Then clearly $r$ must
be an integer multiple of $\rho^*$. If $r = \rho^*$, $R = \{j^*\}$ is a
solution. Otherwise let $N' = N \setminus \{j^*\}$ (so $|N'| = n - 1$)
and let $R'$ be the subset obtained from solving the lemma
for set $N'$ of width $r - \rho^*$. Then $R = R' \cup \{j^*\}$. ■
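The induction in this proof is constructive: repeatedly removing a smallest-width node until the removed widths total r yields the subset R. The following is a minimal sketch of that construction, assuming widths are given as exact powers of two; the function name `subset_of_width` and the list-of-widths representation are assumptions made for illustration.

```python
from fractions import Fraction

def subset_of_width(widths, k):
    """Given widths (powers of two) summing to x*2**-k + r with 0 < r < 2**-k,
    return (r, indices of a subset whose widths sum to r)."""
    unit = Fraction(2) ** -k
    r = sum(widths) % unit            # part of the total width below 2**-k
    if r == 0:
        raise ValueError("total width is a multiple of 2**-k; lemma does not apply")
    chosen, remaining = [], r
    # Take the smallest widths first, mirroring the choice of j* in the proof.
    for idx in sorted(range(len(widths)), key=lambda j: widths[j]):
        if remaining == 0:
            break
        chosen.append(idx)
        remaining -= widths[idx]
    assert remaining == 0             # guaranteed by the lemma
    return r, chosen

widths = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8), Fraction(1, 4)]
r, R = subset_of_width(widths, k=0)   # total 5/4 = 1*2**0 + 1/4
print(r, R)
```

The loop never overshoots because, as the proof notes, the remaining target is always an integer multiple of the smallest remaining width.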
We are now able to prove the main theorem: 
Theorem 4: Any $N$ that is a solution of the Coin
Collector's problem for $t = \rho(N) = n - 1$ has a
corresponding $l^N$ such that $N = \eta(l^N)$ and $\mu(N) =
\min_l L(p,l,f) - L_0(p,f)$.

Proof: By monotonicity of the penalty function,
any optimal solution satisfies the Kraft inequality with
equality. Thus all optimal length distribution nodesets
have $\rho(\eta(l)) = n - 1$. Suppose $N$ is a solution to the
Coin Collector's problem but is not a valid nodeset of a
length distribution. Then there exists an $(i, l)$ with $l > 1$
such that $(i, l) \in N$ and $(i, l - 1) \in I \setminus N$. Let $R' =
N \cup \{(i, l - 1)\} \setminus \{(i, l)\}$. Then $\rho(R') = n - 1 + 2^{-l}$ and,
due to convexity, $\mu(R') \le \mu(N)$. Thus, using Lemma 1
with $k = 0$, $x = n - 1$, and $r = 2^{-l}$, there exists an
$R \subset R'$ such that $\rho(R) = 2^{-l}$ and $\mu(R' \setminus R) < \mu(R') \le
\mu(N)$. Since we assumed $N$ to be an optimal solution
of the Coin Collector's problem, this is a contradiction,
and thus any optimal solution of the Coin Collector's
problem corresponds to an optimal length distribution. ■

Note that the generality of this algorithm makes it
trivially extensible to problems of the form $\sum_i f_i(l_i, p_i)$
for $n$ different functions $f_i$. This might be applicable
if we desire a nonlinear weighting for codewords
(such as an additional utility weight) in addition to and
possibly independent of codeword length and probability.



Because the Coin Collector's problem is linear in time
and space, the overall algorithm finds an optimal code
in $O(n l_{\max})$ time and space for any "well-behaved"
$f(l_i, p_i)$, that is, any $f$ of the form specified for which
same-width inputs would automatically be presorted by
weight for the Coin Collector's problem.
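The reduction can be sketched end to end as follows. This is an illustrative sketch, not the paper's implementation: it uses the common level-by-level (iterative) form of Package-Merge, sorts each level explicitly rather than relying on presorted input, and assumes `f(l, p)` is increasing and convex in `l` with `p` given in nonincreasing order. The function name `optimal_lengths` and the default `l_max = n - 1` are assumptions.

```python
from fractions import Fraction

def optimal_lengths(p, f, l_max=None):
    """Return (lengths, mu(N)) minimizing sum_i f(l_i, p_i) over binary
    prefix-code lengths of at most l_max, via nodes (i, l) of width 2**-l
    and weight M_f(l, p_i) = f(l, p_i) - f(l - 1, p_i)."""
    n = len(p)
    if l_max is None:
        l_max = n - 1
    packages = []                          # packages passed up from the level below
    for l in range(l_max, 0, -1):
        # Items listed from i = n down to i = 1, so ties keep that order.
        items = [(f(l, p[i]) - f(l - 1, p[i]), [(i + 1, l)])
                 for i in reversed(range(n))]
        # sorted() is stable: on equal weight, single items precede packages.
        level = sorted(items + packages, key=lambda entry: entry[0])
        if l > 1:                          # pair neighbors; an odd leftover is dropped
            packages = [(level[j][0] + level[j + 1][0], level[j][1] + level[j + 1][1])
                        for j in range(0, len(level) - 1, 2)]
    chosen = level[:2 * (n - 1)]           # each has width 1/2; total width n - 1
    lengths = [0] * n
    for _, nodes in chosen:
        for i, _level in nodes:
            lengths[i - 1] += 1            # l_i = number of chosen nodes for item i
    return lengths, sum(weight for weight, _ in chosen)

p = [Fraction(2, 5), Fraction(3, 10), Fraction(1, 5), Fraction(1, 10)]
print(optimal_lengths(p, lambda l, q: q * l))   # linear penalty (Huffman case)
```

Counting chosen nodes per item recovers the lengths only because, by Theorem 4, an optimal nodeset contains every level from 1 to l_i for each item.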

The complexity of the algorithm in terms of $n$ alone
depends on the structure of both $f$ and $p$, because, if
we can upper-bound the maximum length codeword, we
can run the Package-Merge algorithm with fewer input
nodes. In addition, if $f$ is not "well-behaved," input to
the Package-Merge algorithm might need to be sorted.

To quantify these behaviors, we introduce one definition
and recall another:

Definition 7: A (coding) problem space is called a flat
class if there exists a constant upper bound $u$ such that
$\frac{\max_i l_i}{\log n} < u$ for any solution $l$.

For example, the space of linear Huffman coding prob- 
lems with all pi > ^ is a flat class. (This may be shown 
using [23].) 

Recall the definition of differential monotonicity given earlier: A cost function
$f(l,p)$ and its associated penalty $L$ are differentially
monotonic or d.m. if, for every $l > 1$, whenever
$f(l - 1, p_i)$ is finite and $p_i > p_j$, $f(l, p_i) - f(l - 1, p_i) >
f(l, p_j) - f(l - 1, p_j)$. This implies that $f$ is continuous
in $p$ at all but a countable number of points. Without
loss of generality, we consider only cases in which it is
continuous everywhere.

If $f(l,p)$ is differentially monotonic, then there is no
need to sort the input nodes for the algorithm. Otherwise,
sorting occurs on $l_{\max}$ rows with $O(n \log n)$ on each
row, $O(n l_{\max} \log n)$ total. Also, if the problem space is
a flat class, $l_{\max}$ is $O(\log n)$; it is $O(n)$ in general. Thus
time complexity for this solution ranges from $O(n \log n)$
to $O(n^2 \log n)$ with space requirement $O(n \log n)$ to
$O(n^2)$; see Table I for details. As indicated in the
table, space complexity can be reduced in differentially
monotonic instances.

TABLE I
Complexity for various types of inputs
(d.m. = differentially monotonic)

problem type          time            space
flat, d.m.            O(n log n)      O(n log n)
  space-optimized     O(n log n)      O(n)
not flat, d.m.        O(n^2)          O(n^2)
  space-optimized     O(n^2)          O(n)
flat, not d.m.        O(n log^2 n)    O(n log n)
not flat, not d.m.    O(n^2 log n)    O(n^2)

D. A Linear-Space Algorithm 

Note that the length distribution returned by the algorithm
need not have the property that $l_i \le l_j$ whenever
$i < j$. For example, if $p_i = p_j$, we are guaranteed no
particular inequality relation between $l_i$ and $l_j$ since we
did not specify a method for breaking ties. Also, even if
all $p_i$ were distinct, there are cost functions for which we
would expect the inequality relation reversed from the
linear case. An example of this is $f(l_i, p_i) = p_i^{-1} 2^{l_i}$,
although this represents no practical problem that the
author is aware of.

Practical cost functions will, given a probability distribution
with nonincreasing $p_i$, generally have at least one
optimal code of monotonically nondecreasing length.
Differential monotonicity is a sufficient condition for
this, and we can improve upon the algorithm by insisting
that the problem be differentially monotonic and all
entries $p_i$ in $p$ be distinct; the latter condition we later
relax. The resulting algorithm uses only linear space and
quadratic time. First we need a definition:

Definition 8: A monotonic nodeset, $N$, is one with
the following properties:

$(i, l) \in N \Rightarrow (i + 1, l) \in N$ for $i < n$ (18)
$(i, l) \in N \Rightarrow (i, l - 1) \in N$ for $l > 1$ (19)

This definition is equivalent to that given in [14]. 

An example of a monotonic nodeset is the set of
nodes enclosed by the dashed line in Fig. 2. Note that a
nodeset is monotonic only if it corresponds to a length
distribution $l$ with lengths sorted in nondecreasing order.

Lemma 2: If a problem is differentially monotonic
and monotonically increasing and convex in each $l_i$, and
if $p$ has no repeated values, then any optimal solution
$N = CC(I, n - 1)$ is monotonic.

Proof: The second monotonic property (19) was
proved for optimal nodesets in Theorem 4, and the first is
now proved with a simple exchange argument, as in [27,
pp. 97-98]. Suppose we have optimal $N$ that violates the
first property (18). Then there exist unequal $i$ and $j$ such
that $p_i < p_j$ and $l_i < l_j$ for optimal codeword lengths $l$
($N = \eta(l)$). Consider $l'$ with lengths for symbols $i$ and
$j$ interchanged. Then

$$
\begin{aligned}
L(p,l',f) - L(p,l,f) &= \sum_k f(l'_k, p_k) - \sum_k f(l_k, p_k) \\
&= \left(f(l_i, p_j) - f(l_i, p_i)\right) - \left(f(l_j, p_j) - f(l_j, p_i)\right) \\
&= \sum_{l = l_i + 1}^{l_j} \left(M_f(l, p_i) - M_f(l, p_j)\right) \\
&< 0
\end{aligned}
$$

where we recall that $M_f(l,p) \triangleq f(l,p) - f(l-1,p)$ and
the final inequality is due to differential monotonicity.
However, this implies that $l$ is not an optimal code,
and thus we cannot have an optimal nodeset without
monotonicity unless values in $p$ are repeated. ■
Taking advantage of this relation to trade off a constant 
factor of time for drastically reduced space complexity 
has been done in [6] for the case of the length-limited 
(linear) penalty (|5}. We now extend this to all convex 
differentially monotonic cases. 
Note that the total width of items that are each less
than or equal to width $\rho$ is less than $2n\rho$. Thus, when
we are processing items and packages of width $\rho$, fewer
than $2n$ packages are kept in memory. The key idea in
reducing space complexity is to keep only four attributes
of each package in memory instead of the full contents.
In this manner, we use linear space while retaining
enough information to reconstruct the optimal nodeset
in algorithmic postprocessing.

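Define $l_{\text{mid}} \triangleq \lfloor \frac{1}{2}(l_{\max} + 1) \rfloor$. Package attributes allow
us to divide the problem into two subproblems with
total complexity that is at most half that of the original
problem. For each package $S$, we retain the following
attributes:

1) Weight: $\mu(S) \triangleq \sum_{(i,l) \in S} \mu(i,l)$

2) Width: $\rho(S) \triangleq \sum_{(i,l) \in S} \rho(i,l)$

3) Midct: $\nu(S) \triangleq |S \cap I_{\text{mid}}|$

4) Hiwidth: $\psi(S) \triangleq \sum_{(i,l) \in S \cap I_{\text{hi}}} \rho(i,l)$

where $I_{\text{hi}} \triangleq \{(i,l) \mid l > l_{\text{mid}}\}$ and $I_{\text{mid}} \triangleq \{(i,l) \mid l =
l_{\text{mid}}\}$. We also define $I_{\text{lo}} \triangleq \{(i,l) \mid l < l_{\text{mid}}\}$.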
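All four attributes are sums over the nodes of a package, so the attributes of a merged package are the componentwise sums of its parts. The following is a minimal sketch of such a record; the class name `Attrs` and helper `node_attrs` are hypothetical, and `l_mid` is assumed to be computed as `(l_max + 1) // 2`.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Attrs:
    weight: Fraction    # mu(S): total weight
    width: Fraction     # rho(S): total width
    midct: int          # nu(S): number of nodes at level l_mid
    hiwidth: Fraction   # psi(S): total width of nodes above level l_mid

    def __add__(self, other):
        # Packaging two entries adds every attribute componentwise.
        return Attrs(self.weight + other.weight, self.width + other.width,
                     self.midct + other.midct, self.hiwidth + other.hiwidth)

def node_attrs(l, weight, l_mid):
    """Attributes of a single node at level l with the given weight."""
    w = Fraction(1, 2 ** l)
    return Attrs(Fraction(weight), w, int(l == l_mid), w if l > l_mid else Fraction(0))

l_mid = (3 + 1) // 2                                      # l_max = 3 gives l_mid = 2
package = node_attrs(3, 1, l_mid) + node_attrs(3, 2, l_mid)
print(package + node_attrs(2, 5, l_mid))
```

Storing only this constant-size record per package, rather than its node list, is what keeps the first run within linear space.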

This retains enough information to complete the "first
run" of the algorithm with $O(n)$ space. The result will be
the package attributes for the optimal nodeset $N$. Thus,
at the end of this first run, we know the value for $m =
\nu(N)$, and we can consider $N$ as the disjoint union of
four sets, shown in Fig. 2:

1) $A$ = nodes in $N \cap I_{\text{lo}}$ with indices in $[1, n - m]$,

2) $B$ = nodes in $N \cap I_{\text{lo}}$ with indices in $[n - m + 1, n]$,

3) $C$ = nodes in $N \cap I_{\text{mid}}$,

4) $D$ = nodes in $N \cap I_{\text{hi}}$.

Due to monotonicity of $N$, it is trivial that $C = [n - m +
1, n] \times \{l_{\text{mid}}\}$ and $B = [n - m + 1, n] \times [1, l_{\text{mid}} - 1]$.
Note then that $\rho(C) = m 2^{-l_{\text{mid}}}$ and $\rho(B) = m[1 -
2^{-(l_{\text{mid}} - 1)}]$. Thus we need merely to recompute which
nodes are in $A$ and in $D$.

Because $D$ is a subset of $I_{\text{hi}}$, $\rho(D) = \psi(N)$ and
$\rho(A) = \rho(N) - \rho(B) - \rho(C) - \rho(D)$. Given their
respective widths, $A$ is a minimal weight subset of
$[1, n - m] \times [1, l_{\text{mid}} - 1]$ and $D$ is a minimal weight
subset of $[n - m + 1, n] \times [l_{\text{mid}} + 1, l_{\max}]$. The nodes
at each level of $A$ and $D$ may be found by recursive
calls to the algorithm. In doing so, we use only $O(n)$
space. Time complexity, however, remains the same; we
replace one run of an algorithm on $n l_{\max}$ nodes with
a series of runs, first one on $n l_{\max}$ nodes, then two on
an average of at most $\frac{1}{4} n l_{\max}$ nodes each, then four on
$\frac{1}{16} n l_{\max}$, and so forth. Formalizing this analysis:

Theorem 5: The above recursive algorithm for generalized
quasiarithmetic convex coding has $O(n l_{\max})$ time
complexity. [14]

Proof: As indicated, this recurrence relation is
considered and proved in [14, pp. 472-473], but we
analyze it here for completeness. To find the time
complexity, set up the following recurrence relation: Let
$T(n, l)$ be the worst case time to find the minimal weight
subset of $[1, n] \times [1, l]$ (of a given width), assuming the
subset is monotonic. Then there exist constants $c_1$ and
$c_2$ such that, if we define $\hat{l} = l_{\text{mid}} - 1 \le \lfloor l/2 \rfloor$ and
$\check{l} = l - \hat{l} - 1 \le \lfloor l/2 \rfloor$, and we let an adversary choose
the corresponding $\hat{n} + \check{n} = n$,

$T(n, l) \le c_1 n$ for $l < 3$

$T(n, l) \le c_2 n l + T(\hat{n}, \hat{l}) + T(\check{n}, \check{l})$ for $l \ge 3$,

where $l < 3$ is the base case. Then $T(n, l) = O(\tau(n, l))$,
where $\tau$ is any function satisfying the recurrence

$\tau(n, l) \ge c_1 n$ for $l < 3$

$\tau(n, l) \ge c_2 n l + \tau(\hat{n}, \frac{l}{2}) + \tau(n - \hat{n}, \frac{l}{2})$ for $l \ge 3$,

which $\tau(n, l) = (c_1 + 2 c_2) n l$ does. Thus, the time
complexity is $O(n l_{\max})$. ■
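The bound in this proof can be spot-checked by evaluating the recurrence directly with an adversarial split of n. This is only a numerical sanity check under assumed values: the constants `C1 = C2 = 1` are arbitrary, and the level split uses `l_mid = (l + 1) // 2`, both of which are assumptions made for the sketch.

```python
from functools import lru_cache

C1, C2 = 1, 1   # hypothetical constants c_1 and c_2

@lru_cache(maxsize=None)
def T(n, l):
    """Worst case of the recurrence, taken with equality and an adversarial split."""
    if l < 3:
        return C1 * n
    l_mid = (l + 1) // 2
    l_hat, l_check = l_mid - 1, l - l_mid      # levels below and above l_mid
    worst_split = max(T(k, l_hat) + T(n - k, l_check) for k in range(n + 1))
    return C2 * n * l + worst_split

def tau(n, l):
    """Candidate bound (c_1 + 2 c_2) n l from the proof."""
    return (C1 + 2 * C2) * n * l

print(T(12, 12), tau(12, 12))
```

The check covers only small n and l; the inductive argument in the proof is what establishes the bound in general.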
The overall complexity is $O(n)$ space and $O(n l_{\max})$
time ($O(n \log n)$ considering only flat classes, $O(n^2)$
in general), as in Table I.

[Fig. 2. The set of nodes I, an optimal nodeset N, and disjoint subsets A, B, C, D. Axes: l (level), ρ (width), i (item).]

However, the assumption of distinct $p_i$'s puts an undesirable
restriction on our input. In their original algorithm
from [14], Larmore and Hirschberg suggest modifying
the probabilities slightly to make them distinct, but this
is unnecessarily inelegant, as the resulting algorithm has
the drawbacks of possibly being slightly nonoptimal and
being nondeterministic; that is, different implementations
of the algorithm could result in the same input yielding
different outputs. A deterministic variant of this approach
could involve modifications by multiples of a suitably
small variable $\epsilon > 0$ to make identical values distinct.
In [28], another method of tie-breaking is presented
for alphabetic length-limited codes. Here, we present a
simpler alternative analogous to this approach, one which
is both deterministic and applicable to all differentially
monotonic instances.

Recall that $p$ is a nonincreasing vector. Thus items of
a given width are sorted for use in the Package-Merge
algorithm; use this order for ties. For example, if we
use the nodes in Fig. 1 ($n = 4$, $f(l,p) = p l^2$) with
probability $p = (0.5, 0.2, 0.2, 0.1)$, then nodes (4, 3) and
(3, 3) are the first to be paired, the tie between (2, 3) and
(3, 3) broken by order. Thus, at any step, all identical-
width items in one package have adjacent indices. Recall
that packages of items will be either in the final nodeset
or absent from it as a whole. This scheme then prevents
any of the nonmonotonicity that identical $p_i$'s might
bring about.

In order to ensure that the algorithm is fully deterministic
(whether or not the linear-space version is used),
the manner in which packages and single items are
merged must also be taken into account. We choose to
merge nonmerged items before merged items in the case
of ties, in a similar manner to the two-queue bottom-
merge method of Huffman coding [20], [29]. Thus, in our
example, the node (1, 2) is chosen whereas the package
of items (4, 3) and (3, 3) is not. This leads to the optimal
length vector $l = (2, 2, 2, 2)$, rather than $l = (1, 2, 3, 3)$
or $l = (1, 3, 2, 3)$, which are also optimal. As in bottom-
merge Huffman coding, the code with the minimum
reverse lexicographical order among optimal codes is the
one produced. This is also the case if we use the position
of the "last" node in a package (in terms of the value
of $nl + i$) in order to choose those with lower values,
as in [28]. However, the above approach, which is easily
shown to be equivalent via induction, eliminates the need
for keeping track of the maximum value of $nl + i$ for
each package.
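The three length vectors in this example can be confirmed to tie under the penalty $\sum_i p_i l_i^2$. The following is a small check in exact arithmetic; reading "reverse lexicographical order" as comparing length vectors from the last entry to the first is an assumption of this sketch, and the names `penalty` and `candidates` are illustrative.

```python
from fractions import Fraction

p = [Fraction(1, 2), Fraction(1, 5), Fraction(1, 5), Fraction(1, 10)]

def penalty(lengths):
    """Total cost sum_i f(l_i, p_i) for f(l, p) = p * l**2."""
    return sum(pi * li * li for pi, li in zip(p, lengths))

candidates = [(2, 2, 2, 2), (1, 2, 3, 3), (1, 3, 2, 3)]
for c in candidates:
    print(c, penalty(c))

# Assumed reading of the tie-break: compare vectors read from the last entry.
preferred = min(candidates, key=lambda c: c[::-1])
print(preferred)
```

Under that reading, the preferred vector coincides with the one the text says the bottom-merge-like rule produces.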

E. Further Refinements 

In this case using a bottom-merge-like coding method
has an additional benefit: We no longer need assume
that all $p_i \ne 0$ to assure that the nodeset is a valid
code. In finding optimal binary codes, of course, it is
best to ignore an item with $p_i = 0$. However, consider
nonbinary output alphabets, that is, $D > 2$. As in Huffman
coding for such alphabets, we must add "dummy"
values of $p_i = 0$ to assure that the optimal code has the
Kraft inequality satisfied with equality, an assumption
underlying both the Huffman algorithm and ours. The
number of dummy values needed is $\text{mod}(D - n, D - 1)$,
where $\text{mod}(x, y) \triangleq x - y \lfloor x/y \rfloor$ and where the dummy
values each consist of $l_{\max}$ nodes, each node with the
proper width and with weight 0. With this preprocessing
step, finding an optimal code should proceed similarly
to the binary case, with adjustments made for both
the Package-Merge algorithm and the overall coding
algorithm due to the formulation of the Kraft inequality
and maximum length. A complete algorithm is available,
with proof of correctness, in [21].
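The dummy-value count is a one-line computation. In the sketch below, Python's `%` operator is used because, for a positive divisor, it computes the floor-based remainder $x - y \lfloor x/y \rfloor$ defined above; the function name `dummy_count` is illustrative.

```python
def dummy_count(n, D):
    """Number of zero-probability dummy items for n symbols and a D-ary
    output alphabet (D >= 2): mod(D - n, D - 1) with floor-based mod."""
    return (D - n) % (D - 1)

for n, D in [(4, 3), (5, 3), (5, 4), (6, 2)]:
    padded = n + dummy_count(n, D)
    # After padding, the item count is congruent to 1 modulo D - 1.
    print(n, D, dummy_count(n, D), padded % (D - 1) == 1 % (D - 1))
```

For the binary case the count is always zero, consistent with the remark that zero-probability items are simply ignored there.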

Note that we have assumed for all variations of this 
algorithm that we knew a maximum bound for length, 
although in the overall complexity analysis for binary 
coding we assumed this was n — 1 (except for flat 
classes). We now explore a method for finding better 
upper bounds and thus a more efficient algorithm. First 
we present a definition due to Larmore: 

Definition 9: Consider penalty functions $f$ and $g$.
We say that $g$ is flatter than $f$ if, for probabilities $p$
and $p'$ and positive integers $l$ and $l'$ where $l' > l$,
$M_g(l,p) M_f(l',p') \le M_f(l,p) M_g(l',p')$ (where, again,
$M_f(l,p) \triangleq f(l,p) - f(l-1,p)$) [6].

A consequence of the Convex Hull Theorem of [6]
is that, given $g$ flatter than $f$, for any $p$, there exist
$f$-optimal $l^{(f)}$ and $g$-optimal $l^{(g)}$ such that $l^{(f)}$ is greater
lexicographically than $l^{(g)}$ (again, with lengths sorted
largest to smallest). This explains why the word "flatter"
is used.

Thus, for penalties flatter than the linear penalty, we
can obtain a useful upper bound, reducing complexity.
All convex quasiarithmetic penalties are flatter than the
linear penalty. (There are some generalized quasiarithmetic
convex coding penalties that are not flatter than the
linear penalty, e.g., $f(l_i, p_i) = l_i p_i^2$, and some flatter
penalties that are not Campbell/quasiarithmetic, e.g.,
$f(l_i, p_i) = 2^{l_i}(p_i + 0.1 \sin \pi p_i)$, so no other similarly
straightforward relation exists.) For most penalties we
have considered, then, we can use the upper bounds in
[23] or the results of a pre-algorithmic Huffman coding
of the symbols to find an upper bound on codeword
length.

A problem in which pre-algorithmic Huffman coding
would be useful is delay coding, in which the quadratic
penalty Q is solved for $O(n^2)$ values of $\alpha$ and $\beta$
[6]. In this application, only one traditional Huffman
coding would be necessary to find an upper bound for
all quadratic cases.

With other problems, we might wish to instead use
a mathematically derived upper bound. Using the maximum
unary codeword length of $n - 1$ and techniques
involving the Golden Mean, $\Phi = \frac{\sqrt{5} + 1}{2}$, Buro in [23]
gives the upper limit of length for a (standard) binary
Huffman codeword as

$$
\min \left\{ \left\lfloor \log_{\Phi} \left( \frac{\Phi + 1}{p_n \Phi + p_{n-1}} \right) \right\rfloor, \; n - 1 \right\},
$$

which would thus be an upper limit on codeword length
for the minimal optimal code obtained using any flatter
penalty function, such as a convex quasiarithmetic function.
This may be used to reduce complexity, especially
in a case in which we encounter a flat class of problem
inputs.
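A bound of this kind is cheap to evaluate before running the main algorithm. The sketch below assumes the bound has the form $\min\{\lfloor \log_\Phi((\Phi + 1)/(p_n \Phi + p_{n-1})) \rfloor,\, n - 1\}$ with $p$ sorted in nonincreasing order; that form is taken on trust here, and the function name `buro_length_bound` is illustrative.

```python
import math

PHI = (1 + math.sqrt(5)) / 2    # the Golden Mean

def buro_length_bound(p):
    """Upper limit on binary Huffman codeword length, under the assumed
    form of the bound; p is nonincreasing with at least two entries."""
    n = len(p)
    p_n, p_n_minus_1 = p[-1], p[-2]           # the two smallest probabilities
    ratio = (PHI + 1) / (p_n * PHI + p_n_minus_1)
    return min(math.floor(math.log(ratio, PHI)), n - 1)

print(buro_length_bound([0.5, 0.2, 0.2, 0.1]))
print(buro_length_bound([0.125] * 8))
```

The returned value could serve as `l_max` for any penalty flatter than the linear one, shrinking the number of input nodes from about n(n - 1) toward n times the bound.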

In addition to this, one can improve this algorithm 
by adapting the binary length-limited Huffman coding 
techniques of Moffat (with others) in [30]-[34]. We do 
not explore these, however, as these cannot improve 
asymptotic results with the exception of a few special 
cases. Other approaches to length-limited Huffman cod- 
ing with improved algorithmic complexity [35], [36] are 
not suited for extension to nonlinear penalties. 

V. Conclusion 

With a similar approach to that taken by Shannon for 
Shannon entropy and Campbell for Renyi entropy, one 
can show redundancy bounds and related properties for 
optimal codes using Campbell's quasiarithmetic penal- 
ties and generalized entropies. For convex quasiarith- 
metic costs, building upon and refining Larmore and 
Hirschberg's methods, one can construct efficient algo- 
rithms for finding an optimal code. Such algorithms can 
be readily extended to the generalized quasiarithmetic 
convex class of penalties, as well as to the delay penalty, 
the latter of which results in more quickly finding an 
optimal code for delay channels. 

One might ask whether the aforementioned properties 
can be extended; for example, can improved redundancy 
bounds similar to [37]-[40] be found? It is an intriguing 
question, albeit one that seems rather difficult to answer 
given that such general penalties lack a Huffman coding 
tree structure. In addition, although we know that optimal 
codes for infinite alphabets exist given the aforemen- 
tioned conditions, we do not know how to find them. 
This, as with many infinite alphabet coding problems, 
remains open. 
It would also be interesting if the algorithms could
be extended to other penalties, especially since complex 
models of queueing can lead to other penalties aside 
from the delay penalty mentioned here. Also, note that 
the monotonicity property of the examples we consider 
implies that the resulting optimal code can be alphabetic, 
that is, lexicographically ordered by item number. If we 
desire items to be in a lexicographical order different 
from that of probability, however, the alphabetic and 
nonalphabetic cases can have different solutions. This 
was discussed for the length-limited penalty in [28]; it 
might be of interest to generalize it to other penalties 
using similar techniques and to prove properties of 
alphabetic codes for such penalties. 
Appendix
The Package-Merge Algorithm 

Here we illustrate and prove the correctness of a recursive
version of the Package-Merge algorithm for solving
the Coin Collector's problem. This algorithm was first
presented in [14], which also has a linear-time iterative
implementation.

Restating the Coin Collector's problem:

$$
\begin{array}{ll}
\text{Minimize}_{\{B \subseteq \{1, \ldots, m\}\}} & \sum_{i \in B} \mu_i \\
\text{subject to} & \sum_{i \in B} \rho_i = t \\
\text{where} & m \in \mathbb{Z}_+ \\
& \rho_i \in 2^{\mathbb{Z}} \\
& t \in \mathbb{R}_+
\end{array}
\qquad (20)
$$

In our notation, we use $i \in \{1, \ldots, m\}$ to denote both
the index of a coin and the coin itself, and $\mathcal{I}$ to represent
the $m$ items along with their weights $\{\mu_i\}$ and widths
$\{\rho_i\}$. The optimal solution, a function of total width $t$
and items $\mathcal{I}$, is denoted $CC(\mathcal{I}, t)$.

Note that we assume the solution exists but might not
be unique. In the case of distinct solutions, tie resolution
for minimizing arguments may for now be arbitrary
or rule-based; we clarify this in Subsection IV-D. A
modified version of the algorithm considers the case
where a solution might not exist, but this is not needed
here. Because a solution exists, assuming $t > 0$, $t =
\omega t_{\text{pow}}$ for some unique odd $\omega \in \mathbb{Z}$ and $t_{\text{pow}} \in 2^{\mathbb{Z}}$. (Note
that $t_{\text{pow}}$ need not be an integer. If $t = 0$, $\omega$ and $t_{\text{pow}}$
are not defined.)

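Algorithm variables

At any point in the algorithm, given nontrivial $\mathcal{I}$ and $t$,
we use the following definitions:

Remainder: $t_{\text{pow}} \triangleq$ the unique $x \in 2^{\mathbb{Z}}$ such that $t/x$ is an odd integer

Minimum width: $\rho^* \triangleq \min_{i \in \mathcal{I}} \rho_i$ (note $\rho^* \in 2^{\mathbb{Z}}$)

Small width set: $\mathcal{I}^* \triangleq \{i \mid \rho_i = \rho^*\}$ (by definition, $|\mathcal{I}^*| \ge 1$)

"First" item: $i^* \triangleq \arg\min_{i \in \mathcal{I}^*} \mu_i$

"Second" item: $i^{**} \triangleq \arg\min_{i \in \mathcal{I}^* \setminus \{i^*\}} \mu_i$ (or null $\Lambda$ if $|\mathcal{I}^*| = 1$)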

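Then the following is a recursive description of the
algorithm:

Recursive Package-Merge Procedure [14]

Basis. $t = 0$: $CC(\mathcal{I}, t)$ is the empty set.

Case 1. $\rho^* = t_{\text{pow}}$ and $\mathcal{I} \ne \emptyset$: $CC(\mathcal{I}, t) =
CC(\mathcal{I} \setminus \{i^*\}, t - \rho^*) \cup \{i^*\}$.

Case 2a. $\rho^* < t_{\text{pow}}$, $\mathcal{I} \ne \emptyset$, and $|\mathcal{I}^*| = 1$: $CC(\mathcal{I}, t) =
CC(\mathcal{I} \setminus \{i^*\}, t)$.

Case 2b. $\rho^* < t_{\text{pow}}$, $\mathcal{I} \ne \emptyset$, and $|\mathcal{I}^*| > 1$: Create
$i'$, a new item with weight $\mu_{i'} = \mu_{i^*} + \mu_{i^{**}}$ and width
$\rho_{i'} = \rho_{i^*} + \rho_{i^{**}} = 2\rho^*$. This new item is thus a combined
item, or package, formed by combining items $i^*$ and
$i^{**}$. Let $S' = CC(\mathcal{I} \setminus \{i^*, i^{**}\} \cup \{i'\}, t)$ (the optimization
of the packaged version). If $i' \in S'$, then $CC(\mathcal{I}, t) =
S' \setminus \{i'\} \cup \{i^*, i^{**}\}$; otherwise, $CC(\mathcal{I}, t) = S'$.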
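The recursive procedure translates almost line for line into code. The following is a minimal sketch, assuming items are stored in a dictionary mapping a name to a (weight, width) pair with widths given as exact powers of two; the names `t_pow` and `coin_collector`, the tuple used to label a package, and the small example instance are all illustrative rather than taken from the paper.

```python
from fractions import Fraction

def t_pow(t):
    """The power of two x (possibly fractional) with t / x an odd integer; t > 0 dyadic."""
    x = Fraction(1)
    while (t / x).denominator != 1:
        x /= 2
    while (t / x) % 2 == 0:
        x *= 2
    return x

def coin_collector(items, t):
    """Return a minimum-weight set of item names with total width t."""
    if t == 0:                                        # Basis
        return set()
    if not items:
        raise ValueError("no solution")
    rho_star = min(width for _, width in items.values())
    tp = t_pow(t)
    if rho_star > tp:
        raise ValueError("no solution")
    small = sorted((k for k, v in items.items() if v[1] == rho_star),
                   key=lambda k: items[k][0])         # stable sort by weight
    first = small[0]
    rest = {k: v for k, v in items.items() if k != first}
    if rho_star == tp:                                # Case 1
        return coin_collector(rest, t - rho_star) | {first}
    if len(small) == 1:                               # Case 2a
        return coin_collector(rest, t)
    second = small[1]                                 # Case 2b: package the pair
    del rest[second]
    package = ("package", first, second)
    rest[package] = (items[first][0] + items[second][0], 2 * rho_star)
    solution = coin_collector(rest, t)
    if package in solution:
        return (solution - {package}) | {first, second}
    return solution

coins = {"a": (1, Fraction(1)), "b": (2, Fraction(1)),
         "c": (4, Fraction(1)), "d": (5, Fraction(2))}
print(coin_collector(coins, Fraction(3)))
```

This sketch re-sorts and copies the item set at every call for clarity, so it does not achieve the linear time of the iterative implementation mentioned above.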

Theorem 6: If an optimal solution to the Coin Col- 
lector's problem exists, the above recursive (Package- 
Merge) algorithm will terminate with an optimal solu- 
tion. 

Proof: We show that the Package-Merge algorithm
produces an optimal solution via induction on the depth
of the recursion. The basis is trivially correct, and each
inductive case reduces the number of items by one.
The inductive hypothesis on $t > 0$ and $\mathcal{I} \ne \emptyset$ is that
the algorithm is correct for any problem instance that
requires fewer recursive calls than instance $(\mathcal{I}, t)$.

If $\mathcal{I} = \emptyset$ and $t \ne 0$, or if $\rho^* > t_{\text{pow}} > 0$, then
there is no solution to the problem, contrary to our
assumption. Thus all feasible cases are covered by those
given in the procedure. Case 1 indicates that the solution
must contain an odd number of elements of width $\rho^*$.
These must include the minimum weight item in $\mathcal{I}^*$,
since otherwise we could substitute one of the items
with this "first" item and achieve improvement. Case 2
indicates that the solution must contain an even number
of elements of width $\rho^*$. If this number is 0, neither $i^*$
nor $i^{**}$ is in the solution. If it is not, then they both are.
If $i^{**} = \Lambda$, the number is 0, and we have Case 2a. If not,
we may "package" the items, considering the replaced
package as one item, as in Case 2b. Thus the inductive
hypothesis holds and the algorithm is correct. ■
Fig. 3 presents a simple example of this algorithm
at work, finding minimum total weight items of total
width $t = 3$ (or, in binary, $11_2$). In the figure, item
width represents numeric width and item area represents
numeric weight. Initially, as shown in the top row, the
minimum weight item with width $\rho_{i^*} = t_{\text{pow}} = 1$ is
put into the solution set. Then, the remaining minimum
width items are packaged into a merged item of width
2 ($10_2$). Finally, the minimum weight item/package with
width $\rho_{i^*} = t_{\text{pow}} = 2$ is added to complete the solution
set, which is now of weight 6. The remaining packaged
item is left out in this case; when the algorithm is used
for coding, several items are usually left out of the
optimal set.
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